Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 1241 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 174.6 KiB |
| Average record size in memory | 144.1 B |
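The average record size follows from the two figures above it: total size in memory divided by the number of observations. A quick arithmetic check, assuming 1 KiB means 1024 bytes:

```python
# Figures copied from the dataset statistics table.
total_kib = 174.6
n_observations = 1241

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 144.1 B shown in the table.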
Variable types
| Categorical | 4 |
|---|---|
| Numeric | 14 |
Alerts
Village has a high cardinality: 1210 distinct values | High cardinality |
pH is highly correlated with WQI | High correlation |
EC is highly correlated with TDS and 10 other fields | High correlation |
TDS is highly correlated with EC and 10 other fields | High correlation |
TH is highly correlated with EC and 10 other fields | High correlation |
Alkalinity is highly correlated with EC and 8 other fields | High correlation |
Calcium is highly correlated with EC and 6 other fields | High correlation |
Magnesium is highly correlated with EC and 9 other fields | High correlation |
Sodium is highly correlated with EC and 9 other fields | High correlation |
Bicarbonate is highly correlated with EC and 8 other fields | High correlation |
Chloride is highly correlated with EC and 7 other fields | High correlation |
Sulphate is highly correlated with EC and 8 other fields | High correlation |
Fluoride is highly correlated with WQI | High correlation |
is_drinkable is highly correlated with EC and 9 other fields | High correlation |
WQI is highly correlated with pH and 10 other fields | High correlation |
EC is highly correlated with TDS and 10 other fields | High correlation |
TDS is highly correlated with EC and 10 other fields | High correlation |
TH is highly correlated with EC and 8 other fields | High correlation |
Alkalinity is highly correlated with EC and 7 other fields | High correlation |
Calcium is highly correlated with EC and 3 other fields | High correlation |
Magnesium is highly correlated with EC and 5 other fields | High correlation |
Sodium is highly correlated with EC and 5 other fields | High correlation |
Potassium is highly correlated with WQI | High correlation |
Bicarbonate is highly correlated with EC and 7 other fields | High correlation |
Chloride is highly correlated with EC and 6 other fields | High correlation |
Sulphate is highly correlated with EC and 4 other fields | High correlation |
Fluoride is highly correlated with WQI | High correlation |
is_drinkable is highly correlated with EC and 5 other fields | High correlation |
WQI is highly correlated with EC and 6 other fields | High correlation |
EC is highly correlated with TDS and 10 other fields | High correlation |
TDS is highly correlated with EC and 10 other fields | High correlation |
TH is highly correlated with EC and 7 other fields | High correlation |
Alkalinity is highly correlated with EC and 6 other fields | High correlation |
Calcium is highly correlated with EC and 2 other fields | High correlation |
Magnesium is highly correlated with EC and 5 other fields | High correlation |
Sodium is highly correlated with EC and 3 other fields | High correlation |
Bicarbonate is highly correlated with EC and 5 other fields | High correlation |
Chloride is highly correlated with EC and 4 other fields | High correlation |
Sulphate is highly correlated with EC and 2 other fields | High correlation |
Fluoride is highly correlated with WQI | High correlation |
is_drinkable is highly correlated with EC and 7 other fields | High correlation |
WQI is highly correlated with EC and 4 other fields | High correlation |
is_drinkable is highly correlated with WQC | High correlation |
WQC is highly correlated with is_drinkable | High correlation |
District is highly correlated with pH and 3 other fields | High correlation |
pH is highly correlated with District | High correlation |
EC is highly correlated with TDS and 12 other fields | High correlation |
TDS is highly correlated with EC and 12 other fields | High correlation |
TH is highly correlated with EC and 8 other fields | High correlation |
Alkalinity is highly correlated with District and 11 other fields | High correlation |
Calcium is highly correlated with EC and 4 other fields | High correlation |
Magnesium is highly correlated with EC and 9 other fields | High correlation |
Sodium is highly correlated with EC and 7 other fields | High correlation |
Potassium is highly correlated with EC and 5 other fields | High correlation |
Bicarbonate is highly correlated with District and 11 other fields | High correlation |
Chloride is highly correlated with EC and 7 other fields | High correlation |
Sulphate is highly correlated with EC and 4 other fields | High correlation |
Fluoride is highly correlated with Alkalinity and 4 other fields | High correlation |
is_drinkable is highly correlated with EC and 9 other fields | High correlation |
WQI is highly correlated with EC and 10 other fields | High correlation |
WQC is highly correlated with District and 9 other fields | High correlation |
Village is uniformly distributed | Uniform |
WQI has unique values | Unique |
Sulphate has 82 (6.6%) zeros | Zeros |
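The high-correlation alerts above flag pairs of variables whose correlation coefficient exceeds some cut-off. The following is a minimal sketch of that idea using Pearson correlation only; the `threshold` value, the function names, and the demo numbers are all assumptions for illustration and are not taken from the report or from the profiling library.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def high_correlation_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `threshold` is a hypothetical cut-off chosen for this sketch.
    """
    flagged = []
    for (a, xa), (b, xb) in combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Made-up illustrative values, not rows from the profiled dataset.
demo = {
    "EC": [100.0, 200.0, 300.0, 400.0],
    "TDS": [52.0, 101.0, 149.0, 203.0],
    "pH": [7.9, 7.1, 8.3, 7.4],
}
pairs = high_correlation_pairs(demo)
print(pairs)
```

With these demo values only the EC and TDS pair is flagged, which mirrors the kind of relationship the alerts describe.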
Reproduction
| Analysis started | 2022-07-24 13:23:20.408365 |
|---|---|
| Analysis finished | 2022-07-24 13:24:24.568987 |
| Duration | 1 minute and 4.16 seconds |
| Software version | pandas-profiling v3.2.0 |
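The reported duration can be checked directly from the two timestamps in the table above. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-07-24 13:23:20.408365", fmt)
finished = datetime.strptime("2022-07-24 13:24:24.568987", fmt)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```

This agrees with the 1 minute and 4.16 seconds listed under Duration.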
District
Categorical
| Distinct | 30 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count |
|---|---|
| Ganjam | 81 |
| Sambalpur | 79 |
| Mayurbhanj | 75 |
| Sundargarh | 68 |
| Bargarh | 68 |
| Other values (25) | 870 |
Length
| Max length | 13 |
|---|---|
| Median length | 10 |
| Mean length | 7.707493956 |
| Min length | 4 |
Characters and Unicode
| Total characters | 9565 |
|---|---|
| Distinct characters | 32 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
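The character figures above count every character across all values and group them by Unicode general category. A minimal sketch of that grouping with the standard `unicodedata` module, run on a handful of district names that appear in this report (the helper name `category_counts` is hypothetical):

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll')."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)

# A small sample of district names from the report, not the full column.
sample = ["Ganjam", "Sambalpur", "Mayurbhanj", "Puri"]
counts = category_counts(sample)
total_characters = sum(counts.values())
distinct_characters = len(set("".join(sample)))
print(counts, total_characters, distinct_characters)
```

For capitalised single-word names like these, only two categories appear: uppercase letters (`Lu`) and lowercase letters (`Ll`), consistent with the two distinct categories reported.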
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Angul |
|---|---|
| 2nd row | Angul |
| 3rd row | Angul |
| 4th row | Angul |
| 5th row | Angul |
Common Values
| Value | Count | Frequency (%) |
| Ganjam | 81 | 6.5% |
| Sambalpur | 79 | 6.4% |
| Mayurbhanj | 75 | 6.0% |
| Sundargarh | 68 | 5.5% |
| Bargarh | 68 | 5.5% |
| Kendujhar | 64 | 5.2% |
| Puri | 62 | 5.0% |
| Koraput | 62 | 5.0% |
| Cuttack | 60 | 4.8% |
| Khordha | 60 | 4.8% |
| Other values (20) | 562 | 45.3% |
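Each frequency in the table above is the count divided by the number of observations (1241), shown to one decimal place. A short sketch that reproduces a few rows; the helper name `frequency_table` is hypothetical:

```python
def frequency_table(counts, total):
    """Map each value to (count, percentage of total rounded to one decimal)."""
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}

# Counts copied from the Common Values table; total is the number of observations.
district_counts = {"Ganjam": 81, "Sambalpur": 79, "Mayurbhanj": 75, "Sundargarh": 68}
table = frequency_table(district_counts, total=1241)
print(table)
```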
Words
| Value | Count | Frequency (%) |
| ganjam | 81 | 6.5% |
| sambalpur | 79 | 6.4% |
| mayurbhanj | 75 | 6.0% |
| sundargarh | 68 | 5.5% |
| bargarh | 68 | 5.5% |
| kendujhar | 64 | 5.2% |
| puri | 62 | 5.0% |
| koraput | 62 | 5.0% |
| cuttack | 60 | 4.8% |
| khordha | 60 | 4.8% |
| Other values (20) | 562 | 45.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2030 | 21.2% |
| r | 1022 | 10.7% |
| u | 731 | 7.6% |
| h | 617 | 6.5% |
| n | 617 | 6.5% |
| d | 396 | 4.1% |
| g | 368 | 3.8% |
| p | 332 | 3.5% |
| l | 299 | 3.1% |
| j | 280 | 2.9% |
| Other values (22) | 2873 | 30.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 8324 | 87.0% |
| Uppercase Letter | 1241 | 13.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2030 | 24.4% |
| r | 1022 | 12.3% |
| u | 731 | 8.8% |
| h | 617 | 7.4% |
| n | 617 | 7.4% |
| d | 396 | 4.8% |
| g | 368 | 4.4% |
| p | 332 | 4.0% |
| l | 299 | 3.6% |
| j | 280 | 3.4% |
| Other values (10) | 1632 | 19.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| K | 259 | 20.9% |
| B | 204 | 16.4% |
| S | 199 | 16.0% |
| G | 113 | 9.1% |
| M | 96 | 7.7% |
| N | 93 | 7.5% |
| P | 62 | 5.0% |
| C | 60 | 4.8% |
| J | 49 | 3.9% |
| A | 45 | 3.6% |
| Other values (2) | 61 | 4.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9565 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 2030 | 21.2% |
| r | 1022 | 10.7% |
| u | 731 | 7.6% |
| h | 617 | 6.5% |
| n | 617 | 6.5% |
| d | 396 | 4.1% |
| g | 368 | 3.8% |
| p | 332 | 3.5% |
| l | 299 | 3.1% |
| j | 280 | 2.9% |
| Other values (22) | 2873 | 30.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 9565 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2030 | 21.2% |
| r | 1022 | 10.7% |
| u | 731 | 7.6% |
| h | 617 | 6.5% |
| n | 617 | 6.5% |
| d | 396 | 4.1% |
| g | 368 | 3.8% |
| p | 332 | 3.5% |
| l | 299 | 3.1% |
| j | 280 | 2.9% |
| Other values (22) | 2873 | 30.0% |
Village
Categorical
| Distinct | 1210 |
|---|---|
| Distinct (%) | 97.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count |
|---|---|
| Nuagaon | 4 |
| Jagannathpur | 3 |
| Kharmanda | 3 |
| Sakhigopal | 2 |
| Gosala | 2 |
| Other values (1205) | 1227 |
Length
| Max length | 28 |
|---|---|
| Median length | 23 |
| Mean length | 9.185334408 |
| Min length | 3 |
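The length statistics above summarise the number of characters in each value. Note that a median of 23 together with a minimum of 3 would imply a mean of at least about 13 over 1241 values, so the reported median and the reported mean (9.19) do not appear to describe the same set of lengths; recomputing from the raw column is one way to check. A minimal sketch, using the five sample rows shown in this report (the helper name `length_stats` is hypothetical):

```python
from statistics import mean, median

def length_stats(values):
    """Return max, median, mean and min of the string lengths."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The five sample rows listed for this variable, not the full column.
sample = ["Chauliakata", "Godibandha", "Samal", "Sipur", "Khamar-1"]
stats = length_stats(sample)
print(stats)
```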
Characters and Unicode
| Total characters | 11399 |
|---|---|
| Distinct characters | 64 |
| Distinct categories | 9 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 1183 |
|---|---|
| Unique (%) | 95.3% |
Sample
| 1st row | Chauliakata |
|---|---|
| 2nd row | Godibandha |
| 3rd row | Samal |
| 4th row | Sipur |
| 5th row | Khamar-1 |
Common Values
| Value | Count | Frequency (%) |
| Nuagaon | 4 | 0.3% |
| Jagannathpur | 3 | 0.2% |
| Kharmanda | 3 | 0.2% |
| Sakhigopal | 2 | 0.2% |
| Gosala | 2 | 0.2% |
| Sikharpur | 2 | 0.2% |
| Indupur | 2 | 0.2% |
| Choudwar | 2 | 0.2% |
| Harbhanga | 2 | 0.2% |
| Usbelika | 2 | 0.2% |
| Other values (1200) | 1217 | 98.1% |
Words
| Value | Count | Frequency (%) |
| 1 | 30 | 2.1% |
| nagar | 9 | 0.6% |
| chhak | 7 | 0.5% |
| 2 | 6 | 0.4% |
| nuagaon | 5 | 0.3% |
| bazar | 5 | 0.3% |
| road | 4 | 0.3% |
| chawk | 3 | 0.2% |
| kharmanda | 3 | 0.2% |
| temple | 3 | 0.2% |
| Other values (1309) | 1358 | 94.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2452 | 21.5% |
| i | 813 | 7.1% |
| r | 745 | 6.5% |
| u | 678 | 5.9% |
| n | 642 | 5.6% |
| h | 549 | 4.8% |
| d | 493 | 4.3% |
| l | 449 | 3.9% |
| p | 410 | 3.6% |
| g | 325 | 2.9% |
| Other values (54) | 3843 | 33.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9418 | 82.6% |
| Uppercase Letter | 1400 | 12.3% |
| Decimal Number | 229 | 2.0% |
| Space Separator | 190 | 1.7% |
| Dash Punctuation | 113 | 1.0% |
| Other Punctuation | 15 | 0.1% |
| Close Punctuation | 15 | 0.1% |
| Open Punctuation | 15 | 0.1% |
| Control | 4 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2452 | 26.0% |
| i | 813 | 8.6% |
| r | 745 | 7.9% |
| u | 678 | 7.2% |
| n | 642 | 6.8% |
| h | 549 | 5.8% |
| d | 493 | 5.2% |
| l | 449 | 4.8% |
| p | 410 | 4.4% |
| g | 325 | 3.5% |
| Other values (16) | 1862 | 19.8% |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 242 | 17.3% |
| K | 170 | 12.1% |
| S | 133 | 9.5% |
| R | 96 | 6.9% |
| G | 86 | 6.1% |
| D | 86 | 6.1% |
| P | 84 | 6.0% |
| J | 77 | 5.5% |
| M | 73 | 5.2% |
| C | 68 | 4.9% |
| Other values (12) | 285 | 20.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 84 | 36.7% |
| 2 | 37 | 16.2% |
| 3 | 27 | 11.8% |
| 0 | 18 | 7.9% |
| 4 | 17 | 7.4% |
| 7 | 11 | 4.8% |
| 5 | 11 | 4.8% |
| 9 | 10 | 4.4% |
| 8 | 7 | 3.1% |
| 6 | 7 | 3.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 190 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 113 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 15 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 15 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 15 | 100.0% |
Control
| Value | Count | Frequency (%) |
| (control character) | 4 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 10818 | 94.9% |
| Common | 581 | 5.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 2452 | 22.7% |
| i | 813 | 7.5% |
| r | 745 | 6.9% |
| u | 678 | 6.3% |
| n | 642 | 5.9% |
| h | 549 | 5.1% |
| d | 493 | 4.6% |
| l | 449 | 4.2% |
| p | 410 | 3.8% |
| g | 325 | 3.0% |
| Other values (38) | 3262 | 30.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 190 | 32.7% |
| - | 113 | 19.4% |
| 1 | 84 | 14.5% |
| 2 | 37 | 6.4% |
| 3 | 27 | 4.6% |
| 0 | 18 | 3.1% |
| 4 | 17 | 2.9% |
| . | 15 | 2.6% |
| ) | 15 | 2.6% |
| ( | 15 | 2.6% |
| Other values (6) | 50 | 8.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11399 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2452 | 21.5% |
| i | 813 | 7.1% |
| r | 745 | 6.5% |
| u | 678 | 5.9% |
| n | 642 | 5.6% |
| h | 549 | 4.8% |
| d | 493 | 4.3% |
| l | 449 | 3.9% |
| p | 410 | 3.6% |
| g | 325 | 2.9% |
| Other values (54) | 3843 | 33.7% |
pH
Numeric
| Distinct | 185 |
|---|---|
| Distinct (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.828622079 |
| Minimum | 6.46 |
| Maximum | 8.78 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 6.46 |
|---|---|
| 5-th percentile | 7.07 |
| Q1 | 7.58 |
| median | 7.9 |
| Q3 | 8.12 |
| 95-th percentile | 8.35 |
| Maximum | 8.78 |
| Range | 2.32 |
| Interquartile range (IQR) | 0.54 |
Descriptive statistics
| Standard deviation | 0.3996080174 |
|---|---|
| Coefficient of variation (CV) | 0.05104448948 |
| Kurtosis | -0.1158318734 |
| Mean | 7.828622079 |
| Median Absolute Deviation (MAD) | 0.26 |
| Skewness | -0.6268527131 |
| Sum | 9715.32 |
| Variance | 0.1596865675 |
| Monotonicity | Not monotonic |
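Several of the statistics above are derived from others in the same tables: the range and IQR are differences of quantiles, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A quick consistency check on the reported values:

```python
from math import isclose

# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 6.46, 8.78
q1, q3 = 7.58, 8.12
mean, std = 7.828622079, 0.3996080174

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(round(value_range, 2), round(iqr, 2), round(cv, 6), round(variance, 6))
```

All four derived values match the table to the precision shown.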
| Value | Count | Frequency (%) |
| 8.3 | 45 | 3.6% |
| 7.9 | 20 | 1.6% |
| 8.2 | 20 | 1.6% |
| 8.02 | 19 | 1.5% |
| 8.1 | 19 | 1.5% |
| 8.08 | 19 | 1.5% |
| 8 | 18 | 1.5% |
| 7.94 | 18 | 1.5% |
| 7.98 | 17 | 1.4% |
| 7.88 | 16 | 1.3% |
| Other values (175) | 1030 |
| Value | Count | Frequency (%) |
| 6.46 | 1 | |
| 6.5 | 1 | |
| 6.54 | 1 | |
| 6.6 | 1 | |
| 6.64 | 1 | |
| 6.71 | 1 | |
| 6.73 | 1 | |
| 6.75 | 1 | |
| 6.77 | 2 | |
| 6.81 | 1 |
| Value | Count | Frequency (%) |
| 8.78 | 1 | |
| 8.63 | 1 | |
| 8.62 | 1 | |
| 8.61 | 1 | |
| 8.6 | 1 | |
| 8.59 | 2 | |
| 8.58 | 1 | |
| 8.56 | 2 | |
| 8.55 | 1 | |
| 8.54 | 1 |
EC
Numeric
| Distinct | 249 |
|---|---|
| Distinct (%) | 20.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 695.4392828 |
| Minimum | 7.15 |
| Maximum | 5770 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 7.15 |
|---|---|
| 5-th percentile | 150 |
| Q1 | 360 |
| median | 550 |
| Q3 | 900 |
| 95-th percentile | 1610 |
| Maximum | 5770 |
| Range | 5762.85 |
| Interquartile range (IQR) | 540 |
Descriptive statistics
| Standard deviation | 536.8190624 |
|---|---|
| Coefficient of variation (CV) | 0.7719136317 |
| Kurtosis | 13.86554926 |
| Mean | 695.4392828 |
| Median Absolute Deviation (MAD) | 250 |
| Skewness | 2.740791102 |
| Sum | 863040.15 |
| Variance | 288174.7058 |
| Monotonicity | Not monotonic |
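The skewness (2.74) and kurtosis (13.87) above point to a long right tail. The sketch below shows how such figures can be computed, assuming the bias-adjusted sample formulas that pandas uses for `Series.skew` and `Series.kurt` (excess kurtosis, so a normal distribution scores 0). Whether the report used exactly these estimators is an assumption, and the demo lists are made up.

```python
from math import sqrt

def sample_skewness(xs):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# Made-up illustrative data, not values from the dataset.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

A symmetric sample gives zero skewness, while a single large value in the right tail pushes the skewness positive, as in the variable above.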
| Value | Count | Frequency (%) |
| 440 | 25 | 2.0% |
| 500 | 23 | 1.9% |
| 400 | 20 | 1.6% |
| 650 | 20 | 1.6% |
| 290 | 19 | 1.5% |
| 430 | 19 | 1.5% |
| 460 | 18 | 1.5% |
| 340 | 18 | 1.5% |
| 450 | 18 | 1.5% |
| 470 | 18 | 1.5% |
| Other values (239) | 1043 |
| Value | Count | Frequency (%) |
| 7.15 | 1 | 0.1% |
| 55 | 1 | 0.1% |
| 60 | 4 | 0.3% |
| 70 | 1 | 0.1% |
| 75 | 2 | 0.2% |
| 80 | 3 | 0.2% |
| 87 | 1 | 0.1% |
| 90 | 4 | 0.3% |
| 100 | 10 | |
| 105 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 5770 | 1 | |
| 4450 | 1 | |
| 4420 | 1 | |
| 3740 | 1 | |
| 3720 | 1 | |
| 3470 | 1 | |
| 3310 | 1 | |
| 3140 | 1 | |
| 3050 | 1 | |
| 3030 | 1 |
TDS
Numeric
| Distinct | 601 |
|---|---|
| Distinct (%) | 48.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 358.0572119 |
| Minimum | 30 |
| Maximum | 2766 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 30 |
|---|---|
| 5-th percentile | 82 |
| Q1 | 186 |
| median | 277 |
| Q3 | 456 |
| 95-th percentile | 843 |
| Maximum | 2766 |
| Range | 2736 |
| Interquartile range (IQR) | 270 |
Descriptive statistics
| Standard deviation | 280.9793428 |
|---|---|
| Coefficient of variation (CV) | 0.7847330914 |
| Kurtosis | 13.07617928 |
| Mean | 358.0572119 |
| Median Absolute Deviation (MAD) | 123 |
| Skewness | 2.742833455 |
| Sum | 444349 |
| Variance | 78949.39108 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 236 | 7 | 0.6% |
| 99 | 7 | 0.6% |
| 207 | 7 | 0.6% |
| 223 | 7 | 0.6% |
| 200 | 7 | 0.6% |
| 204 | 7 | 0.6% |
| 262 | 6 | 0.5% |
| 183 | 6 | 0.5% |
| 267 | 6 | 0.5% |
| 227 | 6 | 0.5% |
| Other values (591) | 1175 |
| Value | Count | Frequency (%) |
| 30 | 1 | |
| 35 | 1 | |
| 36 | 1 | |
| 40 | 1 | |
| 41 | 1 | |
| 42 | 1 | |
| 43 | 2 | |
| 45 | 2 | |
| 46 | 2 | |
| 47 | 2 |
| Value | Count | Frequency (%) |
| 2766 | 1 | |
| 2560 | 1 | |
| 2335 | 1 | |
| 1914 | 1 | |
| 1872 | 1 | |
| 1858 | 1 | |
| 1769 | 1 | |
| 1682 | 1 | |
| 1606 | 1 | |
| 1586 | 1 |
TH
Numeric
| Distinct | 379 |
|---|---|
| Distinct (%) | 30.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 215.0153102 |
| Minimum | 20 |
| Maximum | 1945 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 53 |
| Q1 | 123 |
| median | 184 |
| Q3 | 267 |
| 95-th percentile | 466 |
| Maximum | 1945 |
| Range | 1925 |
| Interquartile range (IQR) | 144 |
Descriptive statistics
| Standard deviation | 156.7872729 |
|---|---|
| Coefficient of variation (CV) | 0.729191204 |
| Kurtosis | 30.89449597 |
| Mean | 215.0153102 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | 3.965647141 |
| Sum | 266834 |
| Variance | 24582.24896 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 158 | 15 | 1.2% |
| 198 | 14 | 1.1% |
| 163 | 13 | 1.0% |
| 157 | 12 | 1.0% |
| 54 | 12 | 1.0% |
| 50 | 12 | 1.0% |
| 177 | 12 | 1.0% |
| 59 | 12 | 1.0% |
| 223 | 11 | 0.9% |
| 124 | 10 | 0.8% |
| Other values (369) | 1118 |
| Value | Count | Frequency (%) |
| 20 | 2 | 0.2% |
| 24 | 1 | 0.1% |
| 25 | 3 | |
| 26 | 1 | 0.1% |
| 29 | 2 | 0.2% |
| 30 | 3 | |
| 33 | 1 | 0.1% |
| 35 | 5 | |
| 37 | 1 | 0.1% |
| 38 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1945 | 1 | |
| 1723 | 1 | |
| 1665 | 1 | |
| 1546 | 1 | |
| 1262 | 1 | |
| 1106 | 1 | |
| 914 | 1 | |
| 900 | 1 | |
| 770 | 1 | |
| 736 | 1 |
Alkalinity
Numeric
| Distinct | 337 |
|---|---|
| Distinct (%) | 27.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 178.4963739 |
| Minimum | 15 |
| Maximum | 765 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 15 |
|---|---|
| 5-th percentile | 44 |
| Q1 | 105 |
| median | 158 |
| Q3 | 228 |
| 95-th percentile | 377 |
| Maximum | 765 |
| Range | 750 |
| Interquartile range (IQR) | 123 |
Descriptive statistics
| Standard deviation | 104.9320142 |
|---|---|
| Coefficient of variation (CV) | 0.5878663635 |
| Kurtosis | 3.034851964 |
| Mean | 178.4963739 |
| Median Absolute Deviation (MAD) | 59 |
| Skewness | 1.339800088 |
| Sum | 221514 |
| Variance | 11010.72761 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 185 | 15 | 1.2% |
| 129 | 14 | 1.1% |
| 50 | 13 | 1.0% |
| 195 | 13 | 1.0% |
| 119 | 13 | 1.0% |
| 157 | 12 | 1.0% |
| 153 | 12 | 1.0% |
| 134 | 12 | 1.0% |
| 105 | 12 | 1.0% |
| 55 | 11 | 0.9% |
| Other values (327) | 1114 |
| Value | Count | Frequency (%) |
| 15 | 1 | 0.1% |
| 20 | 3 | |
| 21 | 1 | 0.1% |
| 24 | 1 | 0.1% |
| 25 | 7 | |
| 26 | 1 | 0.1% |
| 29 | 2 | 0.2% |
| 30 | 5 | |
| 33 | 3 | |
| 34 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 765 | 1 | |
| 750 | 1 | |
| 695 | 1 | |
| 653 | 1 | |
| 644 | 1 | |
| 610 | 1 | |
| 604 | 1 | |
| 599 | 1 | |
| 590 | 1 | |
| 556 | 1 |
Calcium
Numeric
| Distinct | 125 |
|---|---|
| Distinct (%) | 10.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.94198227 |
| Minimum | 0 |
| Maximum | 497 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 26 |
| median | 39 |
| Q3 | 53 |
| 95-th percentile | 94 |
| Maximum | 497 |
| Range | 497 |
| Interquartile range (IQR) | 27 |
Descriptive statistics
| Standard deviation | 30.61219216 |
|---|---|
| Coefficient of variation (CV) | 0.6966502323 |
| Kurtosis | 45.84042566 |
| Mean | 43.94198227 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 4.465762074 |
| Sum | 54532 |
| Variance | 937.1063086 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 40 | 49 | 3.9% |
| 20 | 44 | 3.5% |
| 44 | 38 | 3.1% |
| 22 | 38 | 3.1% |
| 36 | 37 | 3.0% |
| 42 | 37 | 3.0% |
| 38 | 35 | 2.8% |
| 30 | 35 | 2.8% |
| 48 | 33 | 2.7% |
| 24 | 32 | 2.6% |
| Other values (115) | 863 |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 4 | 2 | 0.2% |
| 6 | 9 | 0.7% |
| 8 | 14 | |
| 10 | 30 | |
| 12 | 24 | |
| 13 | 3 | 0.2% |
| 14 | 23 | |
| 15 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 497 | 1 | |
| 276 | 1 | |
| 253 | 1 | |
| 234 | 1 | |
| 206 | 1 | |
| 192 | 1 | |
| 181 | 1 | |
| 178 | 1 | |
| 173 | 1 | |
| 170 | 1 |
Magnesium
Numeric
| Distinct | 106 |
|---|---|
| Distinct (%) | 8.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.61724416 |
| Minimum | -4 |
| Maximum | 345 |
| Zeros | 5 |
| Zeros (%) | 0.4% |
| Negative | 1 |
| Negative (%) | 0.1% |
| Memory size | 9.8 KiB |
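The summary above reports one negative value (minimum -4). If this variable is a measured concentration, which is an assumption here, a negative reading is not physically meaningful and is worth flagging before any modelling. A minimal sketch of such a check, with a hypothetical helper and made-up readings:

```python
def find_negative(values):
    """Return (index, value) pairs for entries below zero."""
    return [(i, v) for i, v in enumerate(values) if v < 0]

# Made-up illustrative readings, not rows from the dataset.
readings = [19.0, 6.0, -4.0, 34.0, 0.0]
negatives = find_negative(readings)
share = 100 * len(negatives) / len(readings)
print(negatives, f"{share:.1f}%")

# In the report, 1 negative out of 1241 observations rounds to 0.1%.
reported_share = round(100 * 1 / 1241, 1)
```

How to treat a flagged value (drop, clip to zero, or trace back to the source) depends on the data collection process and is left open here.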
Quantile statistics
| Minimum | -4 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 10 |
| median | 19 |
| Q3 | 34 |
| 95-th percentile | 67 |
| Maximum | 345 |
| Range | 349 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 25.83513473 |
|---|---|
| Coefficient of variation (CV) | 1.008505621 |
| Kurtosis | 37.75728504 |
| Mean | 25.61724416 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 4.455767424 |
| Sum | 31791 |
| Variance | 667.4541863 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 55 | 4.4% |
| 13 | 49 | 3.9% |
| 12 | 44 | 3.5% |
| 10 | 42 | 3.4% |
| 17 | 41 | 3.3% |
| 4 | 38 | 3.1% |
| 9 | 37 | 3.0% |
| 7 | 37 | 3.0% |
| 16 | 35 | 2.8% |
| 5 | 34 | 2.7% |
| Other values (96) | 829 |
| Value | Count | Frequency (%) |
| -4 | 1 | 0.1% |
| 0 | 5 | 0.4% |
| 1 | 20 | 1.6% |
| 2 | 19 | 1.5% |
| 3 | 8 | 0.6% |
| 4 | 38 | |
| 5 | 34 | |
| 6 | 55 | |
| 7 | 37 | |
| 8 | 30 |
| Value | Count | Frequency (%) |
| 345 | 1 | |
| 300 | 1 | |
| 254 | 1 | |
| 210 | 1 | |
| 171 | 1 | |
| 160 | 1 | |
| 154 | 1 | |
| 151 | 1 | |
| 134 | 1 | |
| 123 | 1 |
Sodium
Numeric
| Distinct | 192 |
|---|---|
| Distinct (%) | 15.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.99194198 |
| Minimum | 0 |
| Maximum | 820 |
| Zeros | 7 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 17 |
| median | 30 |
| Q3 | 65 |
| 95-th percentile | 155 |
| Maximum | 820 |
| Range | 820 |
| Interquartile range (IQR) | 48 |
Descriptive statistics
| Standard deviation | 61.03354392 |
|---|---|
| Coefficient of variation (CV) | 1.220867634 |
| Kurtosis | 30.73986706 |
| Mean | 49.99194198 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | 4.14940391 |
| Sum | 62040 |
| Variance | 3725.093483 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 20 | 65 | 5.2% |
| 1 | 30 | 2.4% |
| 16 | 28 | 2.3% |
| 18 | 28 | 2.3% |
| 15 | 27 | 2.2% |
| 21 | 26 | 2.1% |
| 9 | 25 | 2.0% |
| 31 | 24 | 1.9% |
| 17 | 24 | 1.9% |
| 19 | 23 | 1.9% |
| Other values (182) | 941 |
| Value | Count | Frequency (%) |
| 0 | 7 | 0.6% |
| 1 | 30 | |
| 2 | 18 | |
| 3 | 17 | |
| 4 | 15 | |
| 5 | 17 | |
| 6 | 8 | 0.6% |
| 7 | 8 | 0.6% |
| 8 | 13 | |
| 9 | 25 |
| Value | Count | Frequency (%) |
| 820 | 1 | |
| 505 | 1 | |
| 474 | 1 | |
| 470 | 1 | |
| 423 | 1 | |
| 403 | 1 | |
| 400 | 1 | |
| 341 | 1 | |
| 321 | 1 | |
| 312 | 2 |
Potassium
Numeric
| Distinct | 321 |
|---|---|
| Distinct (%) | 25.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.16253022 |
| Minimum | 0 |
| Maximum | 332 |
| Zeros | 5 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 1.6 |
| median | 3.8 |
| Q3 | 10.1 |
| 95-th percentile | 58 |
| Maximum | 332 |
| Range | 332 |
| Interquartile range (IQR) | 8.5 |
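Quartiles such as Q1 and Q3 above are usually computed by linear interpolation between sorted values, and the IQR is their difference. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which should correspond to the linear interpolation that pandas applies by default; that equivalence and the helper name are stated as assumptions. The demo list simply reuses five quantile values from the table as stand-in data.

```python
from statistics import quantiles

def quartile_summary(xs):
    """Return Q1, median, Q3 and IQR using inclusive linear interpolation."""
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    return {"Q1": q1, "median": med, "Q3": q3, "IQR": q3 - q1}

# Stand-in data: five values borrowed from the quantile table, not the full column.
demo = [0.5, 1.6, 3.8, 10.1, 58.0]
summary = quartile_summary(demo)
print(summary)
```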
Descriptive statistics
| Standard deviation | 29.67099414 |
|---|---|
| Coefficient of variation (CV) | 2.254201407 |
| Kurtosis | 32.15284121 |
| Mean | 13.16253022 |
| Median Absolute Deviation (MAD) | 2.7 |
| Skewness | 5.043696058 |
| Sum | 16334.7 |
| Variance | 880.3678933 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 39 | 3.1% |
| 2 | 28 | 2.3% |
| 1.8 | 28 | 2.3% |
| 3 | 27 | 2.2% |
| 1.1 | 27 | 2.2% |
| 0.8 | 26 | 2.1% |
| 0.1 | 24 | 1.9% |
| 0.7 | 24 | 1.9% |
| 1.2 | 23 | 1.9% |
| 1.4 | 21 | 1.7% |
| Other values (311) | 974 |
| Value | Count | Frequency (%) |
| 0 | 5 | 0.4% |
| 0.1 | 24 | |
| 0.2 | 7 | 0.6% |
| 0.3 | 8 | 0.6% |
| 0.4 | 12 | |
| 0.5 | 14 | |
| 0.6 | 12 | |
| 0.7 | 24 | |
| 0.8 | 26 | |
| 0.9 | 21 |
| Value | Count | Frequency (%) |
| 332 | 1 | |
| 256 | 2 | |
| 232.1 | 1 | |
| 229 | 1 | |
| 207.2 | 1 | |
| 206.8 | 1 | |
| 196 | 1 | |
| 180.6 | 1 | |
| 180.3 | 1 | |
| 178.6 | 1 |
Bicarbonate
Numeric
| Distinct | 380 |
|---|---|
| Distinct (%) | 30.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 215.7993554 |
| Minimum | 18 |
| Maximum | 933 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 55 |
| Q1 | 128 |
| median | 192 |
| Q3 | 275 |
| 95-th percentile | 455 |
| Maximum | 933 |
| Range | 915 |
| Interquartile range (IQR) | 147 |
Descriptive statistics
| Standard deviation | 126.5073791 |
|---|---|
| Coefficient of variation (CV) | 0.5862268629 |
| Kurtosis | 3.12407848 |
| Mean | 215.7993554 |
| Median Absolute Deviation (MAD) | 71 |
| Skewness | 1.352002421 |
| Sum | 267807 |
| Variance | 16004.11697 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 187 | 13 | 1.0% |
| 223 | 13 | 1.0% |
| 238 | 12 | 1.0% |
| 128 | 12 | 1.0% |
| 134 | 12 | 1.0% |
| 154 | 12 | 1.0% |
| 226 | 11 | 0.9% |
| 61 | 11 | 0.9% |
| 67 | 11 | 0.9% |
| 146 | 11 | 0.9% |
| Other values (370) | 1123 |
| Value | Count | Frequency (%) |
| 18 | 1 | 0.1% |
| 24 | 3 | |
| 25 | 1 | 0.1% |
| 29 | 1 | 0.1% |
| 30 | 4 | |
| 31 | 3 | |
| 32 | 1 | 0.1% |
| 35 | 2 | |
| 36 | 3 | |
| 37 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 933 | 1 | |
| 915 | 1 | |
| 848 | 1 | |
| 797 | 1 | |
| 770 | 1 | |
| 744 | 1 | |
| 737 | 1 | |
| 720 | 1 | |
| 670 | 1 | |
| 665 | 1 |
Chloride
Numeric
| Distinct | 279 |
|---|---|
| Distinct (%) | 22.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 92.17002417 |
| Minimum | 0 |
| Maximum | 1753 |
| Zeros | 2 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 10 |
| Q1 | 26 |
| median | 55 |
| Q3 | 110 |
| 95-th percentile | 291 |
| Maximum | 1753 |
| Range | 1753 |
| Interquartile range (IQR) | 84 |
Descriptive statistics
| Standard deviation | 127.1443335 |
|---|---|
| Coefficient of variation (CV) | 1.379454271 |
| Kurtosis | 48.13562126 |
| Mean | 92.17002417 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 5.49142232 |
| Sum | 114383 |
| Variance | 16165.68155 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 26 | 32 | 2.6% |
| 17 | 28 | 2.3% |
| 24 | 27 | 2.2% |
| 7 | 26 | 2.1% |
| 10 | 25 | 2.0% |
| 43 | 24 | 1.9% |
| 19 | 23 | 1.9% |
| 55 | 23 | 1.9% |
| 36 | 21 | 1.7% |
| 29 | 20 | 1.6% |
| Other values (269) | 992 |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.2% |
| 2 | 5 | 0.4% |
| 3 | 2 | 0.2% |
| 5 | 12 | |
| 7 | 26 | |
| 8 | 4 | 0.3% |
| 10 | 25 | |
| 11 | 1 | 0.1% |
| 12 | 20 | |
| 13 | 5 | 0.4% |
| Value | Count | Frequency (%) |
| 1753 | 1 | |
| 1410 | 1 | |
| 1314 | 1 | |
| 1065 | 1 | |
| 986 | 1 | |
| 985 | 1 | |
| 758 | 1 | |
| 754 | 1 | |
| 736 | 1 | |
| 707 | 1 |
Sulphate
Numeric
| Distinct | 115 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.22320709 |
| Minimum | -3 |
| Maximum | 434 |
| Zeros | 82 |
| Zeros (%) | 6.6% |
| Negative | 1 |
| Negative (%) | 0.1% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | -3 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4 |
| median | 17 |
| Q3 | 38 |
| 95-th percentile | 82 |
| Maximum | 434 |
| Range | 437 |
| Interquartile range (IQR) | 34 |
Descriptive statistics
| Standard deviation | 30.66201248 |
|---|---|
| Coefficient of variation (CV) | 1.169270119 |
| Kurtosis | 29.43368742 |
| Mean | 26.22320709 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 3.466286451 |
| Sum | 32543 |
| Variance | 940.1590094 |
| Monotonicity | Not monotonic |
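The median absolute deviation (MAD) above is much smaller than the standard deviation (14 versus about 30.7), which is typical when a few extreme values inflate the standard deviation. A minimal sketch of the MAD as the median of absolute deviations from the median, with made-up data:

```python
from statistics import median, pstdev

def median_absolute_deviation(xs):
    """Median of the absolute deviations from the median."""
    med = median(xs)
    return median(abs(x - med) for x in xs)

# Made-up illustrative data with one extreme value.
demo = [1, 2, 3, 4, 100]
mad = median_absolute_deviation(demo)
print(mad, pstdev(demo))
```

The single outlier barely moves the MAD but dominates the standard deviation.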
| Value | Count | Frequency (%) |
| 0 | 82 | 6.6% |
| 1 | 78 | 6.3% |
| 3 | 58 | 4.7% |
| 4 | 51 | 4.1% |
| 2 | 44 | 3.5% |
| 5 | 43 | 3.5% |
| 11 | 29 | 2.3% |
| 9 | 29 | 2.3% |
| 10 | 26 | 2.1% |
| 18 | 25 | 2.0% |
| Other values (105) | 776 |
| Value | Count | Frequency (%) |
| -3 | 1 | 0.1% |
| 0 | 82 | |
| 1 | 78 | |
| 2 | 44 | |
| 3 | 58 | |
| 4 | 51 | |
| 5 | 43 | |
| 6 | 23 | 1.9% |
| 7 | 25 | 2.0% |
| 8 | 21 | 1.7% |
| Value | Count | Frequency (%) |
| 434 | 1 | 0.1% |
| 250 | 1 | 0.1% |
| 210 | 1 | 0.1% |
| 186 | 1 | 0.1% |
| 160 | 1 | 0.1% |
| 152 | 1 | 0.1% |
| 145 | 1 | 0.1% |
| 142 | 3 | |
| 138 | 1 | 0.1% |
| 135 | 1 | 0.1% |
Fluoride
Numeric
| Distinct | 154 |
|---|---|
| Distinct (%) | 12.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3985173247 |
| Minimum | 0.02 |
| Maximum | 3.94 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 0.02 |
|---|---|
| 5-th percentile | 0.07 |
| Q1 | 0.16 |
| median | 0.27 |
| Q3 | 0.47 |
| 95-th percentile | 1.2 |
| Maximum | 3.94 |
| Range | 3.92 |
| Interquartile range (IQR) | 0.31 |
Descriptive statistics
| Standard deviation | 0.419843934 |
|---|---|
| Coefficient of variation (CV) | 1.053514886 |
| Kurtosis | 16.8172457 |
| Mean | 0.3985173247 |
| Median Absolute Deviation (MAD) | 0.13 |
| Skewness | 3.372375932 |
| Sum | 494.56 |
| Variance | 0.1762689289 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.11 | 42 | 3.4% |
| 0.18 | 35 | 2.8% |
| 0.13 | 35 | 2.8% |
| 0.29 | 35 | 2.8% |
| 0.16 | 34 | 2.7% |
| 0.21 | 34 | 2.7% |
| 0.14 | 32 | 2.6% |
| 0.12 | 32 | 2.6% |
| 0.17 | 31 | 2.5% |
| 0.19 | 31 | 2.5% |
| Other values (144) | 900 |
| Value | Count | Frequency (%) |
| 0.02 | 1 | 0.1% |
| 0.03 | 3 | 0.2% |
| 0.04 | 11 | 0.9% |
| 0.05 | 9 | 0.7% |
| 0.06 | 21 | |
| 0.07 | 20 | |
| 0.08 | 16 | 1.3% |
| 0.09 | 23 | |
| 0.1 | 28 | |
| 0.11 | 42 |
| Value | Count | Frequency (%) |
| 3.94 | 1 | |
| 3.6 | 1 | |
| 3.54 | 1 | |
| 3.32 | 1 | |
| 3.1 | 1 | |
| 3.08 | 1 | |
| 2.91 | 1 | |
| 2.5 | 1 | |
| 2.49 | 1 | |
| 2.41 | 1 |
is_drinkable
Categorical
HIGH CORRELATION
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
| Value | Count |
|---|---|
| 1 | 656 |
| 0 | 585 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1241 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
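The "Frequency (%)" column in these tables is each count divided by the total number of rows, shown to one decimal place. A minimal sketch of that arithmetic for the two `is_drinkable` values follows; the dictionary and variable names are illustrative, not part of the report.

```python
# Counts copied from the Common Values table above.
counts = {"1": 656, "0": 585}
n_rows = sum(counts.values())  # 1241, the number of observations

# Share of each value as a percentage, rounded to one decimal place.
frequency_pct = {value: round(100 * count / n_rows, 1) for value, count in counts.items()}

for value, pct in frequency_pct.items():
    print(f"{value}: {counts[value]} rows, {pct}%")
```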
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1241 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1241 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1241 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 656 | 52.9% |
| 0 | 585 | 47.1% |
WQI
| Distinct | 1241 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.52514704 |
| Minimum | 2.92080857 |
|---|---|
| Maximum | 271.4228748 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.8 KiB |
Quantile statistics
| Minimum | 2.92080857 |
|---|---|
| 5-th percentile | 10.69497045 |
| Q1 | 20.47250591 |
| median | 30.77209109 |
| Q3 | 47.56704838 |
| 95-th percentile | 96.1546608 |
| Maximum | 271.4228748 |
| Range | 268.5020663 |
| Interquartile range (IQR) | 27.09454248 |
Descriptive statistics
| Standard deviation | 30.85646353 |
|---|---|
| Coefficient of variation (CV) | 0.780679285 |
| Kurtosis | 10.00919359 |
| Mean | 39.52514704 |
| Median Absolute Deviation (MAD) | 11.86706255 |
| Skewness | 2.637476657 |
| Sum | 49050.70747 |
| Variance | 952.1213414 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 19.03235114 | 1 | 0.1% |
| 23.94772587 | 1 | 0.1% |
| 14.74641444 | 1 | 0.1% |
| 14.93827374 | 1 | 0.1% |
| 17.39788878 | 1 | 0.1% |
| 25.75117264 | 1 | 0.1% |
| 29.4523277 | 1 | 0.1% |
| 16.68437973 | 1 | 0.1% |
| 26.14416028 | 1 | 0.1% |
| 29.9555589 | 1 | 0.1% |
| Other values (1231) | 1231 | 99.2% |
| Value | Count | Frequency (%) |
| 2.92080857 | 1 | 0.1% |
| 4.741497468 | 1 | 0.1% |
| 4.788191687 | 1 | 0.1% |
| 5.125883257 | 1 | 0.1% |
| 5.464181808 | 1 | 0.1% |
| 5.590406731 | 1 | 0.1% |
| 5.907261341 | 1 | 0.1% |
| 5.911321231 | 1 | 0.1% |
| 5.955143778 | 1 | 0.1% |
| 5.963774771 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 271.4228748 | 1 | 0.1% |
| 225.7888095 | 1 | 0.1% |
| 222.6577585 | 1 | 0.1% |
| 215.1332868 | 1 | 0.1% |
| 205.0670827 | 1 | 0.1% |
| 204.8788207 | 1 | 0.1% |
| 203.5143277 | 1 | 0.1% |
| 191.1416121 | 1 | 0.1% |
| 183.8170026 | 1 | 0.1% |
| 183.0700216 | 1 | 0.1% |
WQC
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.8 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 4 |
| Mean length | 6.236099919 |
| Min length | 4 |
Characters and Unicode
| Total characters | 7739 |
|---|---|
| Distinct characters | 15 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
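The character statistics above can be reproduced from the label counts alone. The sketch below is one way to do it with Python's standard `unicodedata` module, assuming the four labels and counts from the Common Values table (Good 513, Excellent 436, Poor 173, Very Poor 119). The general category codes `Lu`, `Ll` and `Zs` are the Unicode short names for Uppercase Letter, Lowercase Letter and Space Separator; the variable names are illustrative.

```python
import unicodedata
from collections import Counter
from math import isclose

# Label counts taken from the Common Values table of this variable.
label_counts = {"Good": 513, "Excellent": 436, "Poor": 173, "Very Poor": 119}

# Count every character, weighted by how many rows carry the label.
char_counts = Counter()
for label, count in label_counts.items():
    for ch in label:
        char_counts[ch] += count

# Group the same counts by Unicode general category.
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

total_chars = sum(char_counts.values())
n_rows = sum(label_counts.values())
mean_length = total_chars / n_rows

print("total characters:", total_chars)
print("distinct characters:", len(char_counts))
print("categories:", dict(category_counts))
print("mean length:", mean_length)
```

The totals agree with the report: 7739 characters, 15 distinct characters, 3 categories, and a mean label length of about 6.236.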
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Excellent |
|---|---|
| 2nd row | Excellent |
| 3rd row | Very Poor |
| 4th row | Good |
| 5th row | Excellent |
Common Values
| Value | Count | Frequency (%) |
| Good | 513 | 41.3% |
| Excellent | 436 | 35.1% |
| Poor | 173 | 13.9% |
| Very Poor | 119 | 9.6% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| good | 513 | 37.7% |
| excellent | 436 | 32.1% |
| poor | 292 | 21.5% |
| very | 119 | 8.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 1610 | 20.8% |
| e | 991 | 12.8% |
| l | 872 | 11.3% |
| G | 513 | 6.6% |
| d | 513 | 6.6% |
| E | 436 | 5.6% |
| x | 436 | 5.6% |
| c | 436 | 5.6% |
| n | 436 | 5.6% |
| t | 436 | 5.6% |
| Other values (5) | 1060 | 13.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 6260 | 80.9% |
| Uppercase Letter | 1360 | 17.6% |
| Space Separator | 119 | 1.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 1610 | 25.7% |
| e | 991 | 15.8% |
| l | 872 | 13.9% |
| d | 513 | 8.2% |
| x | 436 | 7.0% |
| c | 436 | 7.0% |
| n | 436 | 7.0% |
| t | 436 | 7.0% |
| r | 411 | 6.6% |
| y | 119 | 1.9% |
Uppercase Letter
| Value | Count | Frequency (%) |
| G | 513 | 37.7% |
| E | 436 | 32.1% |
| P | 292 | 21.5% |
| V | 119 | 8.8% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 119 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 7620 | 98.5% |
| Common | 119 | 1.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 1610 | 21.1% |
| e | 991 | 13.0% |
| l | 872 | 11.4% |
| G | 513 | 6.7% |
| d | 513 | 6.7% |
| E | 436 | 5.7% |
| x | 436 | 5.7% |
| c | 436 | 5.7% |
| n | 436 | 5.7% |
| t | 436 | 5.7% |
| Other values (4) | 941 | 12.3% |
Common
| Value | Count | Frequency (%) |
| (space) | 119 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7739 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 1610 | 20.8% |
| e | 991 | 12.8% |
| l | 872 | 11.3% |
| G | 513 | 6.6% |
| d | 513 | 6.6% |
| E | 436 | 5.6% |
| x | 436 | 5.6% |
| c | 436 | 5.6% |
| n | 436 | 5.6% |
| t | 436 | 5.6% |
| Other values (5) | 1060 | 13.7% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
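As a minimal sketch of that calculation, the code below ranks each variable (tied values share the mean of their positions, which is an assumption about tie handling rather than something the report states) and then applies the covariance-over-standard-deviations formula to the ranks. The function names are illustrative, and the sample data are the EC and TDS values of the ten rows listed under "First rows".

```python
from math import sqrt, isclose

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

ec = [210.0, 310.0, 580.0, 390.0, 460.0, 390.0, 480.0, 90.0, 820.0, 2440.0]
tds = [105, 157, 282, 191, 234, 196, 244, 43, 428, 1292]
rho_ec_tds = spearman_rho(ec, tds)
print("rho(EC, TDS) on ten rows:", rho_ec_tds)

# A monotonic but nonlinear relationship: rho is 1 while r stays below 1.
xs = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in xs]
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

Ten rows are far too few to reproduce the report's matrix; the sample only illustrates the mechanics.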
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line (its angle to the x-axis) does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
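The formula and the invariance property can be checked with a few lines of plain Python. This is a sketch under stated assumptions: `pearson_r` is an illustrative helper, not the report's implementation, and the data are the EC and TDS values of the ten rows listed under "First rows".

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    return cov / sqrt(ss_x * ss_y)

ec = [210.0, 310.0, 580.0, 390.0, 460.0, 390.0, 480.0, 90.0, 820.0, 2440.0]
tds = [105, 157, 282, 191, 234, 196, 244, 43, 428, 1292]

r = pearson_r(ec, tds)
# Shift and rescale each variable separately, using positive scale factors.
r_rescaled = pearson_r([2.0 * v + 3.0 for v in ec], [0.5 * v - 1.0 for v in tds])
print(r, r_rescaled)

# Two exact lines with very different slopes both give r = 1.
xs = [1.0, 2.0, 3.0, 4.0]
r_shallow = pearson_r(xs, [0.1 * v for v in xs])
r_steep = pearson_r(xs, [10.0 * v for v in xs])
print(r_shallow, r_steep)
```

Multiplying one variable by a negative factor would flip the sign of r, which is why the invariance is stated for positive scale changes.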
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
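The pair-counting definition translates directly into code. The sketch below implements exactly the formula given above (often called tau-a), where a pair tied in either variable counts as neither concordant nor discordant. Whether the report itself applies a tie-adjusted variant is not stated here, so treat this as an illustration of the definition only; `kendall_tau` is an illustrative name, and the data are the Fluoride and WQI values of the ten rows listed under "First rows".

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it adds to neither count.
    return (concordant - discordant) / len(pairs)

fluoride = [0.27, 0.12, 1.52, 0.31, 0.15, 0.27, 0.11, 0.13, 1.59, 0.94]
wqi = [19.032351, 15.589984, 86.264578, 31.931635, 18.932642,
       23.675539, 17.458106, 12.470002, 88.117828, 74.916497]
print("tau(Fluoride, WQI) on ten rows:", kendall_tau(fluoride, wqi))

# Small hand-checkable case: two concordant pairs, one discordant pair.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```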
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The Phik project provides extensive documentation.
First rows
| | District | Village | pH | EC | TDS | TH | Alkalinity | Calcium | Magnesium | Sodium | Potassium | Bicarbonate | Chloride | Sulphate | Fluoride | is_drinkable | WQI | WQC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Angul | Chauliakata | 7.22 | 210.0 | 105 | 85 | 60 | 22 | 7 | 5 | 4.2 | 73 | 21 | 10 | 0.27 | 1 | 19.032351 | Excellent |
| 1 | Angul | Godibandha | 7.54 | 310.0 | 157 | 100 | 85 | 20 | 12 | 21 | 4.8 | 104 | 43 | 5 | 0.12 | 1 | 15.589984 | Excellent |
| 2 | Angul | Samal | 8.08 | 580.0 | 282 | 125 | 200 | 20 | 18 | 72 | 3.5 | 244 | 21 | 28 | 1.52 | 0 | 86.264578 | Very Poor |
| 3 | Angul | Sipur | 8.25 | 390.0 | 191 | 150 | 145 | 34 | 16 | 16 | 5.7 | 177 | 30 | 3 | 0.31 | 1 | 31.931635 | Good |
| 4 | Angul | Khamar-1 | 7.64 | 460.0 | 234 | 165 | 125 | 42 | 15 | 26 | 5.0 | 153 | 71 | 0 | 0.15 | 1 | 18.932642 | Excellent |
| 5 | Angul | Srirampur | 7.88 | 390.0 | 196 | 160 | 105 | 36 | 17 | 15 | 1.2 | 128 | 60 | 4 | 0.27 | 1 | 23.675539 | Excellent |
| 6 | Angul | Pallahara | 7.99 | 480.0 | 244 | 145 | 110 | 34 | 15 | 41 | 1.8 | 134 | 72 | 15 | 0.11 | 1 | 17.458106 | Excellent |
| 7 | Angul | Jamardihi | 7.62 | 90.0 | 43 | 40 | 25 | 10 | 4 | 1 | 0.7 | 31 | 10 | 2 | 0.13 | 1 | 12.470002 | Excellent |
| 8 | Angul | Sendhogram | 7.81 | 820.0 | 428 | 200 | 245 | 24 | 34 | 95 | 2.2 | 299 | 82 | 44 | 1.59 | 0 | 88.117828 | Very Poor |
| 9 | Angul | Bhogabereni | 7.42 | 2440.0 | 1292 | 515 | 425 | 78 | 78 | 312 | 20.6 | 519 | 363 | 186 | 0.94 | 0 | 74.916497 | Poor |
Last rows
| | District | Village | pH | EC | TDS | TH | Alkalinity | Calcium | Magnesium | Sodium | Potassium | Bicarbonate | Chloride | Sulphate | Fluoride | is_drinkable | WQI | WQC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1231 | Sundargarh | R-27 Sector-7 | 8.26 | 270.0 | 138 | 113 | 98 | 29 | 10 | 9 | 1.9 | 120 | 24 | 5 | 0.27 | 1 | 26.533711 | Good |
| 1232 | Sundargarh | R-29 Sector-9 | 8.28 | 260.0 | 128 | 108 | 110 | 24 | 12 | 9 | 1.2 | 134 | 12 | 4 | 0.27 | 1 | 26.241266 | Good |
| 1233 | Sundargarh | R-30 Sector-13 | 8.11 | 190.0 | 100 | 78 | 55 | 26 | 3 | 6 | 2.9 | 67 | 22 | 7 | 0.18 | 1 | 20.994659 | Excellent |
| 1234 | Sundargarh | R-31 Sector-14 | 8.20 | 170.0 | 89 | 69 | 71 | 22 | 3 | 5 | 3.7 | 87 | 10 | 2 | 0.30 | 1 | 27.922359 | Good |
| 1235 | Sundargarh | R-32 Sector-20 | 8.27 | 330.0 | 168 | 113 | 93 | 31 | 9 | 21 | 4.8 | 114 | 31 | 16 | 0.18 | 1 | 24.550178 | Excellent |
| 1236 | Sundargarh | R-33 Sector 18 | 8.22 | 340.0 | 176 | 93 | 82 | 31 | 4 | 31 | 4.4 | 100 | 34 | 23 | 0.14 | 1 | 21.589240 | Excellent |
| 1237 | Sundargarh | R-34 Sector-17 | 7.90 | 240.0 | 118 | 93 | 49 | 18 | 12 | 8 | 4.5 | 60 | 41 | 5 | 0.14 | 1 | 19.066396 | Excellent |
| 1238 | Sundargarh | R-36 Sector-15 | 8.27 | 340.0 | 156 | 147 | 131 | 29 | 18 | 9 | 2.2 | 160 | 14 | 5 | 0.26 | 1 | 27.092631 | Good |
| 1239 | Sundargarh | R-37 Vedvyas | 8.26 | 740.0 | 378 | 270 | 126 | 77 | 19 | 44 | 1.5 | 154 | 156 | 5 | 0.21 | 0 | 25.709087 | Good |
| 1240 | Sundargarh | R-38 Kalunga | 8.24 | 340.0 | 173 | 152 | 77 | 51 | 6 | 6 | 2.4 | 94 | 53 | 9 | 0.23 | 1 | 25.023410 | Good |